Incremental Learning of Affix Segmentation

Authors

  • Wondwossen Mulugeta
  • Michael Gasser
  • Baye Yimam
Abstract

This paper presents a supervised machine learning approach that incrementally learns to segment affixes using generic background knowledge. We used a Prolog script to split an affix from an Amharic word for further morphological analysis. Amharic, a Semitic language, has very complex inflectional and derivational verb morphology, with many possible prefixes and suffixes that mark various grammatical features. Segmenting these affixes further into valid morphemes is the challenge addressed in this paper. The paper demonstrates how examples presented incrementally, from easy to complex, can be used to learn such language constructs. The experiment showed that affixes can be further segmented into valid prefixes and suffixes using a generic and robust string manipulation script, with the help of an intelligent teacher who presents examples in increasing order of complexity, allowing the system to build its knowledge gradually. The system performs the segmentation with a precision of 0.94 and a recall of 0.97.
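The idea of learning from examples ordered by complexity can be illustrated with a small sketch. It is not the authors' Prolog script: the function names (`learn_incrementally`, `segment`), the longest-match strategy, and the toy affix strings are all assumptions made for illustration, and the strings are placeholders rather than real Amharic morphemes.

```python
# Illustrative sketch only; not the method or data from the paper.
# A "teacher" supplies (affix, gold_segmentation) pairs. The learner
# processes them from simple to complex and stores every morpheme it sees.

def learn_incrementally(examples):
    """Collect known morphemes, visiting examples in order of complexity.

    Complexity is approximated here by the number of morphemes in the
    gold segmentation (an assumption of this sketch).
    """
    known = set()
    for _affix, morphemes in sorted(examples, key=lambda ex: len(ex[1])):
        known.update(morphemes)
    return known


def segment(affix, known):
    """Split `affix` into known morphemes, trying the longest match first.

    Returns a list of morphemes, or None if no full segmentation exists.
    """
    if affix == "":
        return []
    for end in range(len(affix), 0, -1):
        head = affix[:end]
        if head in known:
            rest = segment(affix[end:], known)
            if rest is not None:
                return [head] + rest
    return None


# Toy placeholder strings (hypothetical, not Amharic morphemes).
examples = [
    ("abcd", ["ab", "cd"]),   # complex example
    ("ab", ["ab"]),           # simple example
    ("cd", ["cd"]),           # simple example
]
known = learn_incrementally(examples)
result = segment("cdab", known)   # an unseen combination of known morphemes
```

With these toy inputs, `segment` can split an affix the learner never saw as a whole, as long as each piece was learned from an earlier example.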

Similar resources

Towards a Malay Derivational Lexicon: Learning Affixes Using Expectation Maximization

We propose an unsupervised training method to guide the learning of Malay derivational morphology from a set of morphological segmentations produced by a naïve morphological analyzer. Using a morphology-based language model, we first estimate the probability of a given segmentation. We train the model with EM to find the segmentation that maximizes the probability of each morpheme. We extract t...

Morpheme Segmentation from Distributional Information

Morphology is the study of how meaningful components of form are combined to make complex words. Understanding how such complex words can be ‘broken apart’ into their morphological constituents is the problem of morpheme segmentation. While words that have similar meanings tend to share similar forms (e.g., run and running), many morphemes do not have transparently shared meanings. For example,...

From “Manbearpig” to “Man bear pig”: An Evaluation of Unsupervised Word Segmentation Algorithms

In this paper, we explore diverse methods of unsupervised morphemic segmentation. We test Successor and Predecessor Count algorithms, Entropy algorithms, and Affix Discovery algorithms. The paper examines word stemming based on these algorithms, and the influence of training corpus size on segmentation accuracy. We propose variations on these algorithms to improve overall efficacy. While these ...

The effects of segmentation and redundancy methods on cognitive load and vocabulary learning and comprehension of English lessons in a multimedia learning environment

The present study was conducted to examine the effects of segmentation and redundancy methods on cognitive load, vocabulary learning, and comprehension of English lessons in a multimedia learning environment. In terms of purpose, it is applied research with a true experimental design. The statistical population of the present study includes all people aged 14 to 16 who are enrolled in ...

Improving Word Alignment by Adjusting Chinese Word Segmentation

Most current Chinese word alignment tasks first apply a word segmentation system to identify words. However, word-mismatching problems exist between languages and degrade the performance of word alignment. In this paper, we propose two unsupervised methods that adjust word segmentation so that as many tokens as possible map 1-to-1 between the corresponding sentences. Th...

Publication date: 2012